Make corpus determinism check toolchain-agnostic (#2299)#2377
Merged
Conversation
brendancol
commented
May 25, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Make corpus determinism check toolchain-agnostic
Blockers
None.
Suggestions
_assert_semantic_equalonly compares base-IFD pixels even when both files have an overview chain (test_corpus_determinism.py:143-154). For the COG fixture, an overview-pixel regression in the generator would slip past: the decimation factor list matches but the overview pixels never get read. Consider iterating the overview levels for lossless cells withrasterio.open(path, OVERVIEW_LEVEL=lvl)and comparing each level's pixel array._oracle.pyalready uses that pattern incompare_to_oracle.- Lossy JPEG cells skip pixel comparison entirely (line 147-148). A pixel-content regression in the generator's JPEG branch (wrong input, wrong band order before encode) would not be caught here. A coarse band-wise mean check within a loose tolerance would close the gap without re-introducing libjpeg coupling.
Nits
_is_encoder_sensitive,_assert_semantic_equal, and_nodata_equalhave no direct unit tests. The drift-detection paths were verified manually in the PR; a tiny test that builds a doctored copy with a flipped pixel and asserts_assert_semantic_equalraises would lock the negative path in.- The tmp-dir name in
regenerated_diris still"regen_corpus_1930"(line 178). Harmless but misleading now that the test's scope was extended in #2299.
What looks good
- Dispatch is driven from manifest fields (
cog,compression,tolerance.lossy), not hard-coded fixture ids, so new encoder-sensitive cells inherit the right comparison mode automatically. - The semantic check covers dtype, shape, transform, CRS, nodata (NaN-aware via
_nodata_equal), overview decimation chain, and lossless pixels. - The workflow comment explains why the
--ignoreis gone instead of leaving a silent loosening.
Checklist
- Algorithm matches reference: no algorithm; test-infra change
- All implemented backends produce consistent results: rasterio-only
- NaN handling is correct:
_nodata_equal+equal_nan=True - Edge cases: missing-fixture skip preserved
- Overview-level pixels exercised: not yet (see suggestion)
- Dask chunk boundaries: n/a
- No premature materialization: n/a
- Benchmark: n/a
- README feature matrix: n/a
- Docstrings present and accurate
COG and JPEG fixtures fall back to a semantic rasterio-read comparison instead of byte-md5; the rest still md5-compare. The conda-forge workflow drops its --ignore now that the test handles encoder drift on its own.
…2299) - Lossless cells now also compare every overview level's pixel array, closing the gap where the COG fixture's pyramid could drift unnoticed. - Lossy (JPEG) cells gain a coarse per-band mean tolerance so a content regression in the JPEG branch is still detected. - Add direct negative-path tests that doctor a fixture and assert _assert_semantic_equal rejects it. - Rename the regenerated_dir tmp prefix to reflect the test's scope.
brendancol
commented
May 25, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review
All four findings from the prior review pass were addressed in f5aa2141:
- Suggestion 1 (overview-level pixels):
_assert_semantic_equalnow iteratesrasterio.open(..., OVERVIEW_LEVEL=lvl)for every level the file declares and compares each level's pixel array (lossless cells). See_assert_overview_pixels_exactat test_corpus_determinism.py:194. - Suggestion 2 (lossy mean tolerance):
_assert_pixels_close_lossyat test_corpus_determinism.py:209 computes per-band means and rejects drift above_LOSSY_PIXEL_MEAN_TOL = 4.0(0-255 scale). - Nit 1 (drift unit tests): two new tests cover the negative path:
test_semantic_equal_rejects_lossless_pixel_driftandtest_semantic_equal_rejects_lossy_mean_drift(test_corpus_determinism.py:357 and :371). - Nit 2 (tmp-dir rename):
tmp_path_factory.mktemp("regen_corpus_determinism")at test_corpus_determinism.py:256.
The branch was rebased onto origin/main after the follow-up so the PR diff is now scoped to the two #2299 files; force-push was clean.
Local run: pytest xrspatial/geotiff/tests/golden_corpus/: 96 passed, 1 skipped.
No new blockers, suggestions, or nits from this pass.
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
test_corpus_determinism.pypicks the comparison method per fixture. Byte-md5 for fixtures that are stable across GDAL / libjpeg builds; a semantic rasterio-read comparison (dtype, shape, transform, CRS, nodata, overview chain, pixels) forcog: trueandcompression: jpegcells where the encoder output is toolchain-coupled.tolerance.lossy: truein the manifest, currently the JPEG-YCbCr fixture) skip the pixel-equality check.pytest-geotiff-corpusworkflow drops its--ignore=test_corpus_determinism.pyflag now that the test is no longer toolchain-coupled.Closes #2299.
Test plan
pytest xrspatial/geotiff/tests/golden_corpus/test_corpus_determinism.py: 33 passed, 1 skipped.pytest xrspatial/geotiff/tests/golden_corpus/: 94 passed, 1 skipped._assert_semantic_equal.pytest-geotiff-corpusworkflow goes green on PR with the COG and JPEG fixtures in scope.